Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Abstract

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby is multiplying with, as a first essential step, document layout analysis. While the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among others the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.
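The abstract does not spell out how the visual and textual signals are combined. One simple way to picture such a fusion, offered here only as an illustrative assumption and not as the authors' method, is to paint the embedding of each OCR token into the pixels of its bounding box and to concatenate the resulting map with the image channels, so that a segmentation model sees both signals for every pixel. The function names, the toy page and the 2-dimensional embeddings below are all hypothetical.

```python
# Illustrative sketch only: per-pixel fusion of image channels with a
# rasterized text-embedding map. All names and values are assumptions.

def text_embedding_map(height, width, tokens, dim):
    """Return a height x width grid of dim-sized vectors.

    Each token is ((x0, y0, x1, y1), vector); its vector is written into
    every pixel of its box (x1 and y1 are exclusive). Pixels not covered
    by any token keep a zero vector.
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), vec in tokens:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = list(vec)
    return grid


def fuse(image, emb_map):
    """Concatenate visual channels and text channels for every pixel."""
    return [
        [pixel + emb for pixel, emb in zip(img_row, emb_row)]
        for img_row, emb_row in zip(image, emb_map)
    ]


# Toy 8x8 "page" with 3 visual channels and two OCR tokens.
image = [[[0.5, 0.5, 0.5] for _ in range(8)] for _ in range(8)]
tokens = [
    ((1, 1, 4, 3), [1.0, 0.0]),  # hypothetical embedding of token 1
    ((2, 5, 7, 7), [0.0, 1.0]),  # hypothetical embedding of token 2
]
emb_map = text_embedding_map(8, 8, tokens, dim=2)
fused = fuse(image, emb_map)  # 8 x 8 grid of 5-channel pixels
```

In this sketch the fused grid would simply replace the 3-channel image as input to a segmentation network; real embeddings would have far more dimensions and would come from a trained language model.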

Similar resources

Combining Textual and Visual Features for Image Retrieval

This paper presents the approaches used by the MIRACLE team for image retrieval at ImageCLEF 2005. Text-based and content-based techniques have been tested, along with a combination of both types of methods to improve image retrieval. The text-based experiments defined this year try to use semantic information sources, such as a thesaurus with semantic data or text structure. On the other hand, content...

Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features

In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As a first step, the workflow involves scanning and Optical Character Recognition (OCR) of documents. Preservation of document contexts of single page scans is a major requirement in this context. To facilitate workflo...

Combining Textual and Visual Clusters for Semantic Image Retrieval and Auto-annotation

In this paper, we propose a novel strategy at an abstract level by combining textual and visual clustering results to retrieve images using semantic keywords and auto-annotate images based on similarity with existing keywords. Our main hypothesis is that images that fall into the same text cluster can be described with common visual features of those images. In this approach, images are first c...

UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment

This paper describes our participation in the SemEval-2014 tasks 1, 3 and 10. We used a uniform approach for addressing all the tasks using the soft cardinality for extracting features from text pairs, and machine learning for predicting the gold standards. Our submitted systems ranked among the top systems in all the tasks and sub-tasks in which we participated. These results confirm the resul...

Towards Semantic Enrichment of Newspapers: A Historical Ecology Use Case

Historical ecology research relies on historical accounts of human-animal interactions to study this interaction through space and time. Newspaper archives are a rich source of information, but require careful querying and filtering to collect the relevant information. Traditionally, this is a laborious manual task. In this position paper, we describe our ongoing work on semantically enriching ...

Journal

Journal title: Journal of Data Mining and Digital Humanities

Year: 2021

ISSN: 2416-5999

DOI: https://doi.org/10.46298/jdmdh.6107